Squeeze: Efficient compact fractals for tensor core GPUs

Authors

Abstract

This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, λ(ω) and ν(ω), which act as compact-to-expanded and expanded-to-compact functions, respectively. The cost of the maps is O(log₂ log_s(n)) time, with n being the side of the n×n embedding of the fractal in its expanded form, and s its linear scaling factor. The proposed approach works for any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Experimental results using the Sierpinski Triangle as a case study show up to ≈12× speedup and a memory reduction factor of up to ≈315× with respect to a GPU-based expanded-space bounding box approach. These results show that the proposed compact approach will allow the scientific community to efficiently tackle problems that up to now could not fit into GPU memory.
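To make the compact/expanded duality concrete, here is a minimal sketch of such a pair of maps for the Sierpinski Triangle, the paper's case study. This is a hypothetical CPU-side illustration, not the paper's tensor-core formulation: `lam` and `nu` are stand-ins for λ(ω) and ν(ω), and the map simply interprets a compact index in base 3, one digit per level, selecting one of the three non-overlapping sub-boxes.

```python
# Hypothetical sketch (not the paper's tensor-core formulation): a
# compact-to-expanded map lam() and its inverse nu() for the Sierpinski
# Triangle. A level-k triangle embedded in a 2^k x 2^k grid has only
# 3^k filled cells; interpreting the compact index t in base 3 picks one
# of the three non-overlapping sub-boxes at each of the k levels.

def lam(t: int, k: int) -> tuple:
    """Compact index t in [0, 3^k) -> expanded cell (x, y)."""
    x = y = 0
    for level in range(k - 1, -1, -1):
        digit = (t // 3 ** level) % 3
        if digit == 1:            # right sub-box
            x += 1 << level
        elif digit == 2:          # top sub-box
            y += 1 << level
    return x, y

def nu(x: int, y: int, k: int) -> int:
    """Expanded cell (x, y) on the fractal -> compact index t."""
    t = 0
    for level in range(k - 1, -1, -1):
        xb, yb = (x >> level) & 1, (y >> level) & 1
        t = 3 * t + (1 if xb else (2 if yb else 0))
    return t

# Every mapped cell lies on the Sierpinski pattern (x & y == 0),
# and the two maps are inverses of each other.
k = 4
cells = [lam(t, k) for t in range(3 ** k)]
assert all(x & y == 0 for x, y in cells)
assert len(set(cells)) == 3 ** k
assert all(nu(*lam(t, k), k) == t for t in range(3 ** k))
```

With such a pair of maps, a kernel can launch only 3^k threads (one per filled cell) and still resolve expanded-space neighbor coordinates on the fly, which is where the memory reduction comes from.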


Related articles

Fractals Image Rendering and Compression using GPUs

Fractal image compression provides immense advantages over conventional image compression. Although fractal image encoding time is quite high compared to conventional methods, the decoding time is far less and almost instantaneous. Besides, fractal images are resolution-independent, implying that these images will render the same intensity and quality even when...


High-Performance Tensor Contractions for GPUs

We present a computational framework for high-performance tensor contractions on GPUs. High-performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions plus application-specifics, we demonstrate close to peak performance results. I...


Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computation...


Efficient Synchronization Primitives for GPUs

In this paper, we revisit the design of synchronization primitives— specifically barriers, mutexes, and semaphores—and how they apply to the GPU. Previous implementations are insufficient due to the discrepancies in hardware and programming model of the GPU and CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well as a method of sleeping on the G...


Efficient Parallel RSA Decryption Algorithm for Many-core GPUs with CUDA

Cryptography is an important technique across many applications. In telecommunications, cryptography is necessary when communicating over an untrusted medium in the network. RSA is a public-key cryptography algorithm that uses the pair (N, E) as the public key and D as the private key. N is the product of two large prime numbers p and q that are kept secret. It is very hard and no known polyno...
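The key structure this abstract describes can be sketched in a few lines. A toy illustration only, with tiny hypothetical primes (real RSA uses primes of hundreds of digits and proper padding):

```python
# Toy RSA sketch illustrating the public key (N, E) and private key D:
# N is the product of two secret primes p and q.
p, q = 61, 53                 # the two secret primes (toy-sized)
N = p * q                     # part of the public key
phi = (p - 1) * (q - 1)       # Euler's totient of N
E = 17                        # public exponent, coprime with phi
D = pow(E, -1, phi)           # private exponent: E*D = 1 (mod phi)

m = 65                        # plaintext encoded as an integer < N
c = pow(m, E, N)              # encrypt with the public key (N, E)
assert pow(c, D, N) == m      # decrypt with the private key D
```

Decryption is the computational bottleneck because D is large, which is why many-core parallelization of the modular exponentiation is attractive.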

متن کامل


Journal

Journal title: Future Generation Computer Systems

Year: 2022

ISSN: 0167-739X, 1872-7115

DOI: https://doi.org/10.1016/j.future.2022.04.023